The Problem with Noise and Small Disjuncts
Abstract
Many systems that learn from examples express the learned concept as a disjunction. Those disjuncts that cover only a few examples are referred to as small disjuncts. The problem with small disjuncts is that they have a much higher error rate than large disjuncts but are necessary to achieve a high level of predictive accuracy. This paper investigates the effect of noise on small disjuncts. In particular, we show that when noise is added to two real-world domains, a significant and disproportionate number of the total errors are contributed by the small disjuncts; thus, in the presence of noise, it is the small disjuncts that are primarily responsible for the poor predictive accuracy of the learned concept.
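To make the terminology concrete, the following is a minimal sketch of how one might measure the share of errors contributed by small disjuncts. It is not the procedure used in the paper: the function name, the record layout, the size threshold, and the toy data are all illustrative assumptions.

```python
def error_share_by_disjunct_size(records, size_threshold):
    """Summarize how many test cases and test errors fall to small disjuncts.

    records: iterable of (disjunct_id, training_coverage, is_correct), one
        entry per test example, where training_coverage is the number of
        training examples covered by the disjunct that classified it.
    size_threshold: hypothetical cutoff; a disjunct whose coverage is at or
        below this value is treated as "small".
    """
    total_cases = small_cases = 0
    total_errors = small_errors = 0
    for _disjunct_id, coverage, is_correct in records:
        is_small = coverage <= size_threshold
        total_cases += 1
        small_cases += is_small
        if not is_correct:
            total_errors += 1
            small_errors += is_small
    return {
        # Fraction of all test examples classified by small disjuncts.
        "small_case_share": small_cases / total_cases if total_cases else 0.0,
        # Fraction of all test errors made by small disjuncts.
        "small_error_share": small_errors / total_errors if total_errors else 0.0,
    }


# Invented toy data (not results from the paper): one large disjunct "d1"
# covering 40 training examples and one small disjunct "d2" covering 3.
toy_records = (
    [("d1", 40, True)] * 8
    + [("d1", 40, False)] * 1
    + [("d2", 3, True)] * 1
    + [("d2", 3, False)] * 2
)
summary = error_share_by_disjunct_size(toy_records, size_threshold=5)
print(summary)
```

When the error share of the small disjuncts clearly exceeds their case share, the errors are disproportionately concentrated in them, which is the kind of imbalance the abstract describes.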
Similar resources
Learning with Rare Cases and Small Disjuncts
Systems that learn from examples often create a disjunctive concept definition. Small disjuncts are those disjuncts which cover only a few training examples. The problem with small disjuncts is that they are more error prone than large disjuncts. This paper investigates the reasons why small disjuncts are more error prone than large disjuncts. It shows that when there are rare cases within a do...
Concept Learning and the Problem of Small Disjuncts March
Ideally, definitions induced from examples should consist of all, and only, disjuncts that are meaningful (e.g., as measured by a statistical significance test) and have a low error rate. Existing inductive systems create definitions that are ideal with regard to large disjuncts but far from ideal with regard to small disjuncts, where a small (large) disjunct is one that correctly classifies few (many) training...
A hybrid decision tree/genetic algorithm for coping with the problem of small disjuncts in data mining
The problem of small disjuncts is a serious challenge for data mining algorithms. In essence, small disjuncts are rules covering a small number of examples. Due to their nature, small disjuncts tend to be error prone and contribute to a decrease in predictive accuracy. This paper proposes a hybrid decision tree/genetic algorithm method to cope with the problem of small disjuncts. The basic idea...
A Quantitative Study of Small Disjuncts
Systems that learn from examples often express the learned concept in the form of a disjunctive description. Disjuncts that correctly classify few training examples are known as small disjuncts and are interesting to machine learning researchers because they have a much higher error rate than large disjuncts. Previous research has investigated this phenomenon by performing ad hoc analyses of a ...
The Impact of Small Disjuncts on Classifier Learning
Many classifier induction systems express the induced classifier in terms of a disjunctive description. Small disjuncts are those disjuncts that classify few training examples. These disjuncts are interesting because they are known to have a much higher error rate than large disjuncts and are responsible for many, if not most, of all classification errors. Previous research has investigated thi...
Publication date: 1998